Goto

Collaborating Authors

 pre-ranking model


AIF: Asynchronous Inference Framework for Cost-Effective Pre-Ranking

arXiv.org Artificial Intelligence

In industrial recommendation systems, pre-ranking models based on deep neural networks (DNNs) commonly adopt a sequential execution framework: feature fetching and model forward computation are triggered only after receiving candidates from the upstream retrieval stage. This design introduces inherent bottlenecks, including redundant computations of identical users/items and increased latency due to strictly sequential operations, which jointly constrain the model's capacity and system efficiency. To address these limitations, we propose the Asynchronous Inference Framework (AIF), a cost-effective computational architecture that decouples interaction-independent components, those operating within a single user or item, from real-time prediction. AIF reorganizes the model inference process by performing user-side computations in parallel with the retrieval stage and conducting item-side computations in a nearline manner. This means that interaction-independent components are calculated just once and completed before the real-time prediction phase of the pre-ranking stage. As a result, AIF enhances computational efficiency and reduces latency, freeing up resources to significantly improve the feature set and model architecture of interaction-independent components. Moreover, we delve into model design within the AIF framework, employing approximated methods for interaction-dependent components in online real-time predictions. By co-designing both the framework and the model, our solution achieves notable performance gains without significantly increasing computational and latency costs. This has enabled the successful deployment of AIF in the Taobao display advertising system.


Optimizing E-commerce Search: Toward a Generalizable and Rank-Consistent Pre-Ranking Model

arXiv.org Artificial Intelligence

Beyond these optimizations, meeting the system To enhance user experience and conversion efficiency, the online performance requirements presents a significant challenge. Contrasting search system is employed with a cascading architecture, mainly with existing industry works, we propose a novel method: a including recall and ranking. The ranking stage as the downstream Generalizable and RAnk-ConsistEnt Pre-Ranking Model (GRACE), component directly influences the efficiency of item sorting. Several which achieves: 1) Ranking consistency by introducing multiple superior ranking models have been identified in industrial research, binary classification tasks that predict whether a product is within such as MMoE [4], PLE [12], ESMM [5], DeepFM [1], DIN [18], the top-k results as estimated by the ranking model, which facilitates MIMN [8], SDIM [16], and SIM [12], with a focus on feature engineering, the addition of learning objectives on common point-wise behavioral sequence modeling, and objective function ranking models; 2) Generalizability through contrastive learning optimization. However, as the scale of products within the search of representation for all products by pre-training on a subset of system grows, there is an increasing demand for managing the ranking product embeddings; 3) Ease of implementation in feature time complexity of the sorting module.


Rethinking Large-scale Pre-ranking System: Entire-chain Cross-domain Models

arXiv.org Artificial Intelligence

Industrial systems such as recommender systems and online advertising, have been widely equipped with multi-stage architectures, which are divided into several cascaded modules, including matching, pre-ranking, ranking and re-ranking. As a critical bridge between matching and ranking, existing pre-ranking approaches mainly endure sample selection bias (SSB) problem owing to ignoring the entire-chain data dependence, resulting in sub-optimal performances. In this paper, we rethink pre-ranking system from the perspective of the entire sample space, and propose Entire-chain Cross-domain Models (ECM), which leverage samples from the whole cascaded stages to effectively alleviate SSB problem. Besides, we design a fine-grained neural structure named ECMM to further improve the pre-ranking accuracy. Specifically, we propose a cross-domain multi-tower neural network to comprehensively predict for each stage result, and introduce the sub-networking routing strategy with $L0$ regularization to reduce computational costs. Evaluations on real-world large-scale traffic logs demonstrate that our pre-ranking models outperform SOTA methods while time consumption is maintained within an acceptable level, which achieves better trade-off between efficiency and effectiveness.


COPR: Consistency-Oriented Pre-Ranking for Online Advertising

arXiv.org Artificial Intelligence

Cascading architecture has been widely adopted in large-scale advertising systems to balance efficiency and effectiveness. In this architecture, the pre-ranking model is expected to be a lightweight approximation of the ranking model, which handles more candidates with strict latency requirements. Due to the gap in model capacity, the pre-ranking and ranking models usually generate inconsistent ranked results, thus hurting the overall system effectiveness. The paradigm of score alignment is proposed to regularize their raw scores to be consistent. However, it suffers from inevitable alignment errors and error amplification by bids when applied in online advertising. To this end, we introduce a consistency-oriented pre-ranking framework for online advertising, which employs a chunk-based sampling module and a plug-and-play rank alignment module to explicitly optimize consistency of ECPM-ranked results. A $\Delta NDCG$-based weighting mechanism is adopted to better distinguish the importance of inter-chunk samples in optimization. Both online and offline experiments have validated the superiority of our framework. When deployed in Taobao display advertising system, it achieves an improvement of up to +12.3\% CTR and +5.6\% RPM.


Towards a Better Tradeoff between Effectiveness and Efficiency in Pre-Ranking: A Learnable Feature Selection based Approach

arXiv.org Artificial Intelligence

In real-world search, recommendation, and advertising systems, the multi-stage ranking architecture is commonly adopted. Such architecture usually consists of matching, pre-ranking, ranking, and re-ranking stages. In the pre-ranking stage, vector-product based models with representation-focused architecture are commonly adopted to account for system efficiency. However, it brings a significant loss to the effectiveness of the system. In this paper, a novel pre-ranking approach is proposed which supports complicated models with interaction-focused architecture. It achieves a better tradeoff between effectiveness and efficiency by utilizing the proposed learnable Feature Selection method based on feature Complexity and variational Dropout (FSCD). Evaluations in a real-world e-commerce sponsored search system for a search engine demonstrate that utilizing the proposed pre-ranking, the effectiveness of the system is significantly improved. Moreover, compared to the systems with conventional pre-ranking models, an identical amount of computational resource is consumed.